Randomized Linear Algebra

See Yuxin Chen, Randomized linear algebra.

Matrix multiplication

Let $A\in\mathbb{R}^{m\times n}$ and $B\in\mathbb{R}^{n\times p}$. Then,

$$ C=AB=\sum_{i=1}^{n} A_{:,i} B_{i,:}. $$

Assume, for simplicity, that $m=n=p$.
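
A quick sanity check of the rank-one expansion on a small random example (a minimal sketch; the size 5 is arbitrary):

In [ ]:
using LinearAlgebra
A₀=rand(5,5)
B₀=rand(5,5)
# The sum of the rank-one terms A₀[:,i]*B₀[i,:]ᵀ equals the full product A₀*B₀
norm(sum(A₀[:,i]*transpose(B₀[i,:]) for i=1:5)-A₀*B₀)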

Idea: approximate $C$ by randomly sampling $r$ rank-one components.

Algorithm: for $l=1,\cdots,r$ pick $i_l\in\{1,\cdots,n\}$ i.i.d. with probabilities $\mathbb{P}\{i_l=k\}=p_k$ and compute

$$M=\sum_{l=1}^r \frac{1}{rp_{i_l}} A_{:,i_l} B_{i_l,:}$$
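
As a sketch, the whole estimator fits in a few lines of Julia (the function name sample_product() and the use of Categorical from Distributions.jl are choices made here for illustration):

In [ ]:
using Distributions, LinearAlgebra
# Randomized estimate of A*B from r rank-one terms sampled with replacement,
# index k being chosen with probability p[k]
function sample_product(A,B,p,r)
    i=rand(Categorical(p),r)
    A[:,i]*Diagonal(1 ./ (r*p[i]))*B[i,:]
end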

Rationale: $M$ is an unbiased estimate of $C$,

\begin{align*} \mathbb{E}[M]&=\sum_{l=1}^r \sum_k \mathbb{P}\{i_l=k\} \frac{1}{r p_k} A_{:,k}B_{k,:} =\sum_k A_{:,k}B_{k,:}=C. \end{align*}
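
Unbiasedness can also be checked empirically by averaging many independent estimates (a small sketch reusing the hypothetical sample_product() from above on a 50×50 example with uniform probabilities):

In [ ]:
A₀=rand(50,50)
B₀=rand(50,50)
# The average of many independent estimates approaches C=A₀*B₀
Mmean=sum(sample_product(A₀,B₀,fill(1/50,50),10) for _=1:1000)/1000
norm(Mmean-A₀*B₀)/norm(A₀*B₀)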

Possible choices of the importance sampling probabilities $p_k$ are:

  • Uniform sampling: $p_k=\displaystyle\frac{1}{n}$

  • Nonuniform sampling:

$$ p_k=\frac{\|A_{:,k}\|_2 \|B_{k,:}\|_2} {\sum_l \|A_{:,l}\|_2 \|B_{l,:}\|_2}, \tag{1} $$

which can be computed in one pass over the data using $O(n)$ memory and $O(n^2)$ operations (a sketch follows below).
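
A sketch of this computation as a helper function (the name sampling_probabilities() is a choice made here; the nonuniform-sampling cells below perform the same computation inline):

In [ ]:
using LinearAlgebra
# One pass over k=1,…,n storing only the n weights ‖A[:,k]‖₂‖B[k,:]‖₂,
# i.e. O(n) memory and O(n²) operations, then normalize
function sampling_probabilities(A,B)
    w=[norm(view(A,:,k))*norm(view(B,k,:)) for k=1:size(A,2)]
    w/sum(w)
end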

Theorem. [Optimality] $\mathbb{E}[\|M-AB\|_F^2]$ is minimized for $p_k$ given by (1).

Theorem. [Error] Choose $p_k\geq \displaystyle\frac{\beta \|A_{:,k}\|_2 \|B_{k,:}\|_2} {\sum_l \|A_{:,l}\|_2 \|B_{l,:}\|_2}$ for some $0<\beta \leq 1$. If $r\geq \displaystyle\frac{\log n}{\beta}$, then

$$\|M-AB\|_F\leq \sqrt{\frac{\log n}{\beta r}} \|A\|_F \|B\|_F $$

with probability exceeding $1-O(n^{-10})$.


In [1]:
using Distributions
using Random
Random.seed!(1345)
using LinearAlgebra
using SparseArrays

In [2]:
# Dense test matrices
n=3000
A=rand(n,n)
B=rand(n,n)
C=A*B
# Lower bound on the number of samples r from the Error theorem (β=1)
β=1.0
log(n)/β


Out[2]:
8.006367567650246

In [3]:
# Uniform sampling, pₖ=1/n
r=400
iᵣ=rand(1:n,r)
p=1/n
# Scaled sum of r sampled rank-one terms vs. the full product
@time M=A[:,iᵣ]*B[iᵣ,:]/(r*p)
@time C=A*B;


  0.379247 seconds (512.15 k allocations: 180.649 MiB, 8.34% gc time)
  0.708572 seconds (2 allocations: 68.665 MiB)

In [4]:
# Actual error, bound from the Error theorem, and ‖C‖ for scale (norm() is the Frobenius norm)
norm(M-C),sqrt(log(n)/(β*r))*norm(A)*norm(B), norm(C)


Out[4]:
(98375.9760565034, 424507.02816975326, 2.250470537412928e6)

In [5]:
# Nonuniform sampling probabilities from (1)
pA=[norm(view(A,:,k)) for k=1:n]
pB=[norm(view(B,k,:)) for k=1:n]
s=pA⋅pB
p=[pA[k]*pB[k]/s for k=1:n]
sum(p)


Out[5]:
1.0000000000000002

In [6]:
# Sample r indices with probabilities p and rescale each term by 1/(r pᵢ)
iᵣ=rand(Categorical(p),r);
@time Mₙ=A[:,iᵣ]*inv(Diagonal(r*p[iᵣ]))*B[iᵣ,:]
norm(Mₙ-C),sqrt(log(n)/(β*r))*norm(A)*norm(B), norm(C)


  0.347693 seconds (554.18 k allocations: 123.760 MiB, 4.95% gc time)
Out[6]:
(96919.32727001642, 424507.02816975326, 2.250470537412928e6)

In [7]:
# Sparse, nonuniform
n=10000
A=sprand(n,n,0.1)
B=sprand(n,n,0.1)
@time C=A*B
β=1.0
log(n)/β


 19.292305 seconds (121.18 k allocations: 1.645 GiB, 0.38% gc time)
Out[7]:
9.210340371976184

In [8]:
# C is full
nnz(C)/prod(size(C))


Out[8]:
1.0

In [9]:
# Nonuniform
pA=[norm(view(A,:,k)) for k=1:n]
pB=[norm(view(B,k,:)) for k=1:n]
s=pA⋅pB
p=[pA[k]*pB[k]/s for k=1:n]
sum(p)


Out[9]:
1.0

In [10]:
r=1000
iᵣ=rand(Categorical(p),r);
@time Mₙ=A[:,iᵣ]*inv(Diagonal(r*p[iᵣ]))*B[iᵣ,:]
norm(Mₙ-C),sqrt(log(n)/(β*r))*norm(A)*norm(B), norm(C)


  3.950247 seconds (864.99 k allocations: 1.726 GiB, 0.29% gc time)
Out[10]:
(104792.86163141944, 319914.7396134604, 252219.24933879502)
